An exactly solvable model based on the topology of a protein native state is applied to identify
bottlenecks and key sites for the folding of HIV-1 Protease. The predicted sites are found to correlate
well with clinical data on resistance to FDA-approved drugs. Drug therapy has been observed to induce
multiple mutations in the protease. The sites where such mutations
occur correlate well with those involved in the folding bottlenecks identified through the deterministic
procedure proposed in this study. The high statistical significance of the observed correlations
suggests that the approach may be used in conjunction with traditional techniques to
identify candidate locations for drug attacks.

INTRODUCTION 

One of the fundamental open questions in molecular biology is how to predict the folded state of a protein from
the knowledge of its sequence. Despite the large increase in available computing power in recent years, it has not been
possible to answer this question by means of computer simulations, whatever their degree of complexity and detail.
However, an increasing body of experimental and theoretical results supports the view
that the folding of natural proteins into their native state is largely influenced by the native-state topology.
Accordingly, the folding process is regarded as a well-defined sequence of obligatory steps to
be taken in order to reach the native state. Even if protein sequences have evolved to fold efficiently, the kinetics
en route to the native state might be hindered by particularly difficult (rate-limiting) steps, such as
the formation of non-local amino acid interactions (contacts), which usually requires overcoming large entropy
barriers. Some non-local native contacts are crucial for the folding process, because their formation helps
establish further native interactions and leads to rapid progress along the folding pathway until another barrier
is met. Their formation is associated with bottlenecks for the entire folding process. Strikingly, the amino acids involved
in such crucial contacts are those for which the largest changes in the folding kinetics are observed in site-directed
mutagenesis experiments, as first proven for CI2 and Barnase. This suggests that protein sequences have been
carefully optimised so as to exploit the conformational entropy reduction accompanying the folding process through
the selection of the key amino acids. The number and importance of bottlenecks depend significantly on several
factors. Among the most important are the contact order of the protein and whether it folds in two or more
stages.



In previous studies [14, 15], we have shown how the most delicate folding stages can be identified within a molecular
dynamics approach, by monitoring the formation probability of native and non-native contacts from the unfolded to
the native state. This can be done either as a function of time at a fixed temperature around the folding temperature
or by working at thermal equilibrium for a succession of decreasing temperatures (annealing). In principle, the two
approaches need not be equivalent but, for the quantities we have investigated, they give consistent results. Hence,
for the identification of crucial contacts, one can safely concentrate on studying thermodynamic equilibrium
at various temperatures. The main limitation of Molecular Dynamics (MD) and Monte Carlo (MC) simulations,
especially for long protein chains, is that they are extremely time-demanding and plagued by statistical errors
which can affect predictions based on the relative sensitivity of contact formation. It
would therefore be highly desirable to develop a suitable theoretical model, amenable to a deterministic (and computationally
fast) treatment, that leads to a deeper understanding of the problem. Ideally, such a model should encompass
all the "necessary ingredients" that are usually included in computer simulations: peptide chain constraints, effective
interactions between residues, favourable monomeric positions, etc. In the following we describe a recently developed
theoretical scheme that, while being very simplified and approximate compared to other schemes based on MD
or MC simulations, can be treated analytically, leading to expressions that can be evaluated exactly. The calculated
quantities rival those obtained through more sophisticated but computationally demanding MC and MD techniques.
The purpose of the present paper is to show how the model can be employed to yield observables that help identify
the folding bottlenecks. In particular we apply the method to the HIV-1 Protease (HIV-1 PR), an enzyme which
is crucially involved in HIV infection. In general, accurate knowledge of the bottlenecks has important
pharmaceutical ramifications because it may be exploited in rational drug design. Due to the large
amount of available clinical data, HIV-1 PR is a natural choice for a stringent test of our automated predictive scheme.



THEORY 



The model we adopt builds on the importance of the native-state topology in steering the folding process, that is
in bringing into contact pairs of amino acids that are found in interaction in the native state. A primary quantity of
interest that we shall calculate is the probability that a given native contact is established at a definite stage of the
folding process. Probably the oldest attempt to calculate such a quantity dates back to Flory, who tried to estimate the
probability $p_{ij}$ that two sites $i$ and $j$ in a long harmonic chain (the peptide) are in contact. The approximation
introduced by Flory was to neglect correlations between residues, which amounts to considering the chain embedded in
a high-dimensional space. As a result, the $p_{ij}$'s are a decreasing function of the sequence separation $|i - j|$. Clearly,
this approximation is not apt to pinpoint the key folding sites, since it exploits the native topology at the simplest
level; in fact it takes into account only the contact order of native interactions. The Flory approach, however, can be
refined by incorporating correlations between the formation of pairs, triplets etc. of contacts [21]. Here we
use a recently introduced energy function that allows the $p_{ij}$'s to be calculated within a self-consistent analytic scheme.
The strategy is similar in spirit to that of Go and Scheraga, where only the formation of native interactions is
energetically rewarded, and is common to all recent approaches that exploit the native-state topology.

We describe the protein through the coordinates $\mathbf{r}_i$ of the C$_\alpha$ atom of the $i$-th amino acid. The simplified energy
functional for a chain of $N$ residues is

$$ H = \frac{T K}{2} \sum_{i=1}^{N-1} \left( \mathbf{r}_{i,i+1} - \mathbf{r}^0_{i,i+1} \right)^2 + \frac{1}{2} \sum_{i \neq j} \Delta_{ij} \left[ \left( \mathbf{r}_{ij} - \mathbf{r}^0_{ij} \right)^2 - R^2 \right] \theta_{ij} \tag{1} $$

where $K$ is the strength of the peptide bonds, assumed to be harmonic, and $T$ is the absolute temperature in units
of the Boltzmann constant.

The relative position of two amino acids is denoted by $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$ and the corresponding native positions
are indicated with the superscript 0. $\Delta$ is the contact matrix, whose element $\Delta_{ij}$ is 1 if residues $i$ and $j$ are in contact
in the native state (i.e. their C$_\alpha$ separation is below the cutoff $c = 6.5$ Å) and 0 otherwise. The matrix $\Delta_{ij}$, along
with the set of native separations $\{\mathbf{r}^0_{ij}\}$, encodes the topology of the protein. The factor $\theta_{ij}$ has the form

$$ \theta_{ij} = \Theta\!\left( R^2 - \left( \mathbf{r}_{ij} - \mathbf{r}^0_{ij} \right)^2 \right) \tag{2} $$

where $\Theta(x)$ is the unit step function and $R$ is a distance cutoff defining the range of the interaction between
non-consecutive amino acids. In standard off-lattice approaches, the interaction $V(d)$ between non-bonded amino acids at
a distance $d$ is taken to be a square-well potential, or some type of Lennard-Jones interaction. Our choice in Eq. (1)
is a sort of "harmonic well" which, while being physically sound and viable, is suitable for a self-consistent treatment,
as explained below. The location of the outer rim of the well is controlled by $R$, which can be set to a few Angstroms
($R = 3$ Å in the present study) to penalise conformations where the separation of two residues differs too much
from the native one. In the native state each $\theta_{ij}$ is close to 1, while in the denatured state the $\theta_{ij}$ are usually
negligible.
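As an illustration of the two ingredients just defined, the following minimal sketch builds a native contact matrix from C$_\alpha$ coordinates and evaluates the step-function factor of Eq. (2). The function names and the toy coordinates are hypothetical, and the convention $\Theta(0) = 0$ (strict inequality) is an assumption, since the text does not specify the value of the step function at the origin.

```python
import math

def contact_map(coords, cutoff=6.5):
    """Symmetric 0/1 native contact matrix for C-alpha coordinates given in Angstrom."""
    n = len(coords)
    delta = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                delta[i][j] = delta[j][i] = 1
    return delta

def theta(r_i, r_j, r0_i, r0_j, R=3.0):
    """Eq. (2): 1 if r_ij deviates from its native value r0_ij by less than R, else 0."""
    dev = [(a - b) - (a0 - b0) for a, b, a0, b0 in zip(r_i, r_j, r0_i, r0_j)]
    return 1 if sum(d * d for d in dev) < R * R else 0

# Toy "native" structure (made-up coordinates, 3.8 Angstrom between neighbours).
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
delta = contact_map(native)
```

Consecutive residues are kept in the contact matrix here; they can be skipped later, where the primed sums exclude them.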

While the present form of the model does not accurately describe the effects of self-avoidance this does not lead to 
a qualitatively wrong behaviour in the highly-denatured ensemble (large T). The treatment of steric effects becomes 
progressively more accurate as temperature is lowered. In fact, the model guarantees that the native state is the 
true ground state and therefore protein conformations found at low temperature inherit the native self-avoidance. 
The connectedness of the chain, as well as its entropy, are captured in a simple but non-trivial manner. The most 
significant advantage of the model is that it can be used to explore the equilibrium thermodynamics without being 
hampered by inaccurate or sluggish dynamics. 

Two limiting cases of the model of Eq. (1) are worthy of notice. In the absence of any bias towards the target structure (i.e. when
both the $\Delta_{ij}$ and the $\mathbf{r}^0_{ij}$'s are removed) the model reduces to the standard Gaussian polymer model, whose behaviour is
exactly known. Furthermore, in the limit $T \to 0$ (when all native contacts are established and the bonded-energy
term fluctuations are negligible) the model reduces to the Gaussian network model that has been introduced
and used to study the near-native vibrational properties of several proteins.

The thermodynamics of the model is fully determined by the partition function

$$ Z(T) = \int \prod_{i=1}^{N} d^3 r_i \, \exp(-H/T). \tag{3} $$

In the integral of Eq. (3) and in the following, it is understood that translational invariance is
explicitly broken by fixing, for example, the centre of mass of the system (see Appendix).

The integral is still hard to treat analytically, due to the presence of non-quadratic interactions in the last term
of the Hamiltonian of Eq. (1). We thus perform a further, but non-trivial, simplification by replacing $H$ with the variational
Hamiltonian $H_0$

$$ H_0 = \frac{T K}{2} \sum_{i=1}^{N-1} \left( \mathbf{r}_{i,i+1} - \mathbf{r}^0_{i,i+1} \right)^2 + \frac{1}{2} \sum_{i \neq j} \Delta_{ij} \left[ \left( \mathbf{r}_{ij} - \mathbf{r}^0_{ij} \right)^2 - R^2 \right] p_{ij} \tag{4} $$

where the factors $\theta_{ij}$ are now substituted by parameters $p_{ij}$ independent of the coordinates. Due to its quadratic form,
the model described by Eq. (4) can be solved with the standard techniques for Gaussian integrals. The parameters $p_{ij}$
have to be determined so as to ensure self-consistency:

$$ p_{ij} = \left\langle \Theta\!\left( R^2 - \left( \mathbf{r}_{ij} - \mathbf{r}^0_{ij} \right)^2 \right) \right\rangle_0 . \tag{5} $$

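The symbol $\langle \ldots \rangle_0$ indicates that the thermal averages are performed with the Hamiltonian $H_0$. Within this
self-consistent approach the problem is fully solved, and we can compute the resulting partition function, from which we
extract all the thermal properties and averages. In particular, the logarithm of the partition function, $\ln Z$, has the
following explicit expression:

$$ \ln Z = \frac{3(N-1)}{2} \ln(2\pi) - \frac{3}{2} \ln N - \frac{3}{2} \ln(\det{}' M) + \frac{R^2}{2T} \sum_{lm} \Delta_{lm} \, p_{lm} \tag{6} $$

where the matrix $M$ is defined in terms of the native contact matrix $\Delta$ as:

$$ M_{ij} = \begin{cases} K \left( 2 - \delta_{i,1} - \delta_{i,N} \right) + 2 \sum_{m} \Delta_{im} \, p_{im} / T & \text{for } i = j \\ -2 \, p_{ij} \, \Delta_{ij} / T - K \left[ \delta_{i,j+1} + \delta_{i,j-1} \right] & \text{for } i \neq j \end{cases} \tag{7} $$

and the prime in Eq. (6) denotes that the zero eigenvalue of $M$ has to be omitted (see Appendix).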

The self-consistent quantities $p_{ij}$ represent precisely the occurrence probability of a contact between residues $i$ and $j$
and indicate the frequency with which that native contact is established. At thermal equilibrium their dependence on
temperature reflects the degree of compactness of the protein molecule. For instance, well below the folding temperature,
$T_F$, each $p_{ij}(T)$ is expected to assume a value close to unity, as all native contacts are already formed. Instead, for
temperatures much larger than $T_F$, all $p_{ij}(T)$ tend to be very small, reflecting the low propensity of the protein to
establish contacts.
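The self-consistency condition can be illustrated numerically. The sketch below is not the authors' implementation: it assumes a simple damped fixed-point iteration, a hypothetical 8-residue toy contact list, and the standard result that, for a three-dimensional isotropic Gaussian displacement with per-component variance $s^2$, the probability of lying within a sphere of radius $R$ is $\mathrm{erf}(a) - (2/\sqrt{\pi})\, a\, e^{-a^2}$ with $a = R/\sqrt{2 s^2}$. The zero mode is lifted by adding a constant to every matrix element, as discussed in the Appendix; this does not affect the variance of a relative displacement.

```python
import math

def invert(a):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def build_m(p, delta, K, T):
    """Matrix M of Eq. (7): chain springs K plus contact springs 2 p_ij Delta_ij / T."""
    n = len(delta)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w = 2.0 * p[i][j] * delta[i][j] / T + (K if abs(i - j) == 1 else 0.0)
                m[i][j] = -w
                m[i][i] += w
    return m

def contact_probability(s2, R):
    """P(|d| < R) for a 3D isotropic Gaussian vector d with per-component variance s2."""
    a = R / math.sqrt(2.0 * s2)
    return math.erf(a) - 2.0 / math.sqrt(math.pi) * a * math.exp(-a * a)

def solve_p(delta, T, K=1.0 / 15.0, R=3.0, mix=0.5, tol=1e-10, max_iter=2000):
    """Damped fixed-point iteration for the self-consistent p_ij (assumed scheme)."""
    n = len(delta)
    p = [[float(delta[i][j]) for j in range(n)] for i in range(n)]  # start from p = 1
    for _ in range(max_iter):
        m = build_m(p, delta, K, T)
        # Adding c = 1 to every element lifts the zero mode and makes M invertible.
        g = invert([[m[i][j] + 1.0 for j in range(n)] for i in range(n)])
        change = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if delta[i][j]:
                    s2 = g[i][i] + g[j][j] - 2.0 * g[i][j]
                    new = (1.0 - mix) * p[i][j] + mix * contact_probability(s2, R)
                    change = max(change, abs(new - p[i][j]))
                    p[i][j] = p[j][i] = new
        if change < tol:
            break
    return p

# Hypothetical toy topology: an 8-residue hairpin-like set of native contacts.
toy = [[0] * 8 for _ in range(8)]
for i, j in [(0, 7), (1, 6), (2, 5), (0, 6), (1, 5)]:
    toy[i][j] = toy[j][i] = 1
```

Calling `solve_p(toy, T)` on a grid of temperatures yields one sigmoidal curve per native contact, which is the raw material for the analysis that follows.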

Thermodynamic quantities can be easily derived from the $p_{ij}$'s. Another quantity necessary to characterize the
folding transition is the specific heat, which exhibits one or more peaks in correspondence with significant structural
rearrangements of the protein conformation. Since every energy change is mainly associated with the formation of native
interactions, we address the question of which native contacts contribute most to the peak(s) of the specific heat.
A clear answer to this question is readily found in the temperature behaviour of the frequencies $p_{ij}$. Indeed, each $p_{ij}(T)$
exhibits a sigmoidal shape, and the modulus of its derivative develops a sharp maximum at the
point of inflection (crossover temperature). The importance of every native contact $i$-$j$ turns out to be characterized
by the crossover temperature and the maximum slope of its $p_{ij}$, which can be regarded as an indicator of its degree
of cooperativity. In fact, the most important contacts are those with a high crossover temperature and an associated high
cooperativity.
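A minimal sketch of how the two indicators could be extracted from a sampled $p_{ij}(T)$ curve is given below. It assumes a central finite difference on an increasing temperature grid; the logistic curve used as input is a made-up stand-in for a real $p_{ij}(T)$, and the function name is hypothetical.

```python
import math

def crossover(temps, p_values):
    """Return (crossover temperature, maximum |dp/dT|) from samples of one p(T) curve."""
    best_t, best_slope = None, 0.0
    for k in range(1, len(temps) - 1):
        slope = abs((p_values[k + 1] - p_values[k - 1]) / (temps[k + 1] - temps[k - 1]))
        if slope > best_slope:
            best_t, best_slope = temps[k], slope
    return best_t, best_slope

# Made-up sigmoid centred at T0 = 1.2 with width w = 0.05 (maximum slope 1 / (4 w) = 5).
temps = [0.5 + 0.01 * k for k in range(151)]
curve = [1.0 / (1.0 + math.exp((t - 1.2) / 0.05)) for t in temps]
t_c, slope = crossover(temps, curve)
```

Ranking all native contacts by `t_c` and by `slope` gives the two orderings used later to single out the key contacts.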

This fact allows a complete identification and classification of the bottlenecks, because we are now able to identify
those contacts that are thermodynamically relevant to the peaks and shoulders of the specific heat.



APPLICATION TO HIV-1 PROTEASE 



The human immunodeficiency virus (HIV) encodes a protease, HIV-1 PR, whose inhibition is crucial to prevent the
maturation of infectious HIV particles. The role of the protease in the spreading of the infection is to act as a "molecular
scissor", cleaving inactive viral polyproteins into smaller, functional proteins. In the presence of protease inhibitors,
viral particles are unable to mature and are rapidly cleared. Extensive clinical trials have led to the development
of five HIV-1 PR inhibitors approved by the Food and Drug Administration (FDA): Saquinavir mesylate (SAQ),
Ritonavir (RIT), Indinavir sulfate (IND), Nelfinavir mesylate (NLF) and Amprenavir (APR). Such drugs are
particularly effective in short-term treatments, while their long-term efficacy is limited by resistance. Indeed, mutants
resistant to protease inhibitors can emerge in vivo after less than one year. Table I summarises the list
of HIV-1 PR mutating sites known to cause drug resistance.

In an earlier work, the study of the near-native harmonic vibrations of HIV-1 PR has shown that a number
of sites that are paramount to the stability of the native enzyme are close to some of the residues of Table I.
The self-consistent scheme introduced above allows this result to be extended by modelling the partially folded ensemble at finite
temperature.

In particular, we will be concerned with the characterization of this ensemble near the folding transition temperature.
The motivation to do so stems from a recent study where we have shown that such mutating amino acids
correspond, with high statistical significance, to sites involved in the folding kinetic bottlenecks. The rationale for
this finding is that the most effective drugs can be eluded only by mutations occurring at the key
sites. Due to the sensitivity of the folded native conformation to these sites, only fine-tuned mutations are allowed
there. Such mutations have to result both in a native-like enzymatic activity and in the avoidance
of the drug action. These constraints act as a severe selective pressure on the mutated proteases that the HIV virus
is able to express. As a result, the mutations that will ultimately cause drug resistance are expected to occur
at the crucial sites. These residues are heavily influenced by the native topology, and hence should
display little dependence on the particular (effective) drug to be eluded.

It is therefore our purpose to apply the scheme introduced in the previous section and identify the key residues within
our topology-based scheme. The method, being completely analytic, is free from the statistical uncertainty common to
all MC and MD simulation methods, and from the difficulty (due to spatial restraints) of reaching the target native state
below the folding temperature.



RESULTS AND DISCUSSION 



The structural model at the basis of our analysis is the free enzyme. It is a homodimer with C2 symmetry,
each subunit being composed of 99 residues (Fig. 1). Previous studies [14] have shown that geometrically important
residue positions can be obtained by considering a single monomer. Indeed, on decreasing the temperature, the specific heat
of the whole homodimer shows a peak in correspondence with the folding of each subunit and then, at lower
temperature, another peak that signals the aggregation of the two subunits. Thus, in the following, we will be concerned
only with a single monomer. The specific heat is obtained through numerical differentiation of the average internal
energy, which has the following explicit analytic expression in terms of the $p_{ij}(T)$'s and the quantities introduced
before:

$$ \langle E \rangle = \frac{3(N-1)}{2} \, T - \frac{R^2}{2} \sum_{ij} \Delta_{ij} \, p_{ij}(T) \tag{8} $$
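The numerical differentiation step can be sketched as follows. The energy curve fed to it is a hypothetical toy function (a linear equipartition-like term plus a sigmoidal contact term with arbitrary prefactors), not the HIV-1 PR result; in practice the energies would come from evaluating the internal energy on a temperature grid. The function names are illustrative.

```python
import math

def specific_heat(temps, energies):
    """Central-difference estimate of C_v = d<E>/dT at the interior grid points."""
    return [
        (temps[k], (energies[k + 1] - energies[k - 1]) / (temps[k + 1] - temps[k - 1]))
        for k in range(1, len(temps) - 1)
    ]

def folding_temperature(temps, energies):
    """Temperature at which the numerically differentiated specific heat is largest."""
    return max(specific_heat(temps, energies), key=lambda tc: tc[1])[0]

def toy_energy(T, n_res=99, weight=810.0, T_F=1.0, width=0.05):
    """Hypothetical energy curve: linear term minus a sigmoidal contact contribution."""
    p = 1.0 / (1.0 + math.exp((T - T_F) / width))
    return 1.5 * (n_res - 1) * T - weight * p

grid = [0.5 + 0.01 * k for k in range(101)]
t_fold = folding_temperature(grid, [toy_energy(t) for t in grid])
```

A finer grid sharpens the estimate of the peak position; because the underlying expressions are deterministic, no smoothing of the curve is needed before differentiating.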

The study of Go and Scheraga showed that systems described by energy-scoring functions that reward the
formation of native contacts display cooperative (all-or-none) folding transitions with associated peak(s) in the
specific heat. Consistent with these expectations, the specific heat calculated by differentiating Eq. (8) with respect
to $T$ shows a single peak, see Fig. 2, thus providing an unambiguous criterion for identifying the folding transition
temperature, $T_F$. The width of the specific heat peak at the folding transition in Fig. 2 is larger than the typical
one found in experimental and theoretical studies. It is possible to enhance the cooperativity of the
transition by changing the value of $K$ in Eq. (1); in fact, a decrease of $K$ leads to sharper transitions. An
alternative criterion for fixing the value of $K$ is provided by its influence on the average amount of native structure
that is formed at the folding transition. Since we are particularly interested in monitoring the progressive establishment of
native contacts, we adopted this second possibility to set the value of $K$. In fact, by choosing $K = 1/15$ in Eq. (4), we
ensure that, at $T_F$, the average fractional occupation of native contacts, $q$:

$$ q = \frac{\sum'_{ij} \Delta_{ij} \, p_{ij}}{\sum'_{ij} \Delta_{ij}} \tag{9} $$

is about 50% (see Fig. 2), as established in several experiments and numerical studies. The primed summation
symbol indicates that the sum is not carried out over consecutive pairs. The degree of native similarity, $q$, is a useful
overall indicator to monitor the progress towards the native state in a folding process [31]. While the ultimate
quantities of interest are the $p_{ij}$'s, it is useful to consider an intermediate level of description and focus on the whole
network of contacts that a given site takes part in. A natural order parameter is provided by the "average environment
formation" [33, 34] which, for a generic site $i$, is defined as:

$$ P_i = \frac{\sum'_{j} \Delta_{ij} \, p_{ij}}{\sum'_{j} \Delta_{ij}} . \tag{10} $$

$P_i$ is a measure of the fraction of established native contacts that the $i$-th residue participates in (clearly, $P_i$ is defined only
when the denominator of Eq. (10) is non-zero). The environment profiles for three different temperatures are shown
in Fig. 3. The irregular behaviour of the profiles results from a complex interplay of the burial of the sites and the
locality of their contacts. The hierarchical formation of secondary structures at high temperature is clearly visible.
It is instructive to correlate the location of the sites known to cause resistance to drug treatments (see Table I) with
the features of the profiles. In particular, several mutating sites responsible for drug resistance can be
found in correspondence with the peaks of the environments (see in particular sites 20, 63, 71, 77, 84). The most precise
way to identify the key residues is, however, through the analysis of the fractional occupation of individual native contacts
and not through the environments, since the latter carry only averaged information. Typical $p_{ij}$ curves as a function of
temperature are shown in Fig. 4.
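The two order parameters just introduced, the overall native fraction $q$ and the per-site environment $P_i$, are simple weighted averages of the contact probabilities. The sketch below assumes that the primed sums skip pairs with $|i - j| \le 1$ and uses made-up probabilities for a five-residue toy chain; the function names are hypothetical.

```python
def native_fraction(p, delta):
    """Fraction q of formed native contacts, skipping self and consecutive pairs."""
    n = len(delta)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            if abs(i - j) > 1:
                num += delta[i][j] * p[i][j]
                den += delta[i][j]
    return num / den

def environment(p, delta):
    """Per-site average P_i; None where site i has no non-consecutive native contact."""
    n = len(delta)
    profile = []
    for i in range(n):
        partners = [j for j in range(n) if abs(i - j) > 1 and delta[i][j]]
        if partners:
            profile.append(sum(p[i][j] for j in partners) / len(partners))
        else:
            profile.append(None)
    return profile

# Toy data: 5 residues, chain neighbours always in contact, three non-local contacts.
n = 5
delta = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
p = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
for (i, j), value in {(0, 2): 0.8, (0, 3): 0.4, (1, 3): 0.6}.items():
    delta[i][j] = delta[j][i] = 1
    p[i][j] = p[j][i] = value

q = native_fraction(p, delta)
profile = environment(p, delta)
```

Returning `None` for sites without non-local contacts mirrors the remark that $P_i$ is defined only when its denominator is non-zero.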

As anticipated in the Theory section, all $p_{ij}$'s have monotonic sigmoidal shapes which mainly reflect the sequence
separation, $|i - j|$, and the native burial of each of the residues. In general, each contact is established at a different
crossover temperature and with a different intensity. The data on the frequencies of native-contact formation
are conveniently summarised in the color-coded contact maps of Fig. 5. A bright red color is used to highlight those
contacts with the largest crossover temperatures above $T_F$, see Fig. 5a, or the highest intensity, see Fig. 5b. Both these
intuitive notions can be used to identify the key folding contacts. The inspection of Fig. 5 reveals that several kinetic
bottlenecks (red regions) are located three to four contacts downstream of the three β-turns in HIV-1 PR. In addition,
the formation of contacts around residues 84 and 30, despite their being so far apart along the sequence, appears to be
a crucial folding stage since it allows the collapse of the individual secondary structure motifs. It is striking that
these results closely parallel those of an earlier study in which long and delicate MD simulations of the
unfolding/refolding of HIV-1 PR were carried out using a much more sophisticated energy-scoring function. This
provides a cross-validation of the robustness of the results obtained both in the stochastic and in the present, analytic,
scheme. The emphasis is on the exactness of the present approach, which allows the $p_{ij}$'s to be determined easily and with
arbitrary accuracy. The absence of stochastic noise allows us to compile Table II, which shows the top contacts ranked
according to crossover temperature and intensity. Sites that are known to cause drug resistance through mutations
are highlighted in boldface. It is apparent that a high fraction of the top key folding contacts do, indeed, contain key
mutating sites. To test the significance of such matches we compare the number of marked mutating sites contained
in each column of Table II with the number of those contained in a randomly compiled table. We expect a random
list of $t$ elements extracted among $N$, $m$ of which are marked, to contain an average of $tm/N$ marked elements with
a variance of $tm(N - m)(N - t)/(N^2 (N - 1))$. For the case of HIV-1 PR the total number of contacts
(excluding consecutive residues) within a cutoff radius of 6.5 Å is $N = 180$ and the number of those which include
at least one known mutating site is $m = 60$. An analysis of the contacts of Table II (selected according to crossover
temperature or cooperativity of formation) shows that the number of matches observed among the top sites typically
exceeds that expected from a random choice by a standard deviation (the precise amount depends on how many
top-ranking contacts are considered). An alternative and apparently more stringent approach is to identify independent
groups of highly correlated contacts, and then search for the key residues in each group. To a first approximation,
the correlated sets of interacting pairs may be identified with the clusters in the contact map. This leads us to define
six main groups: the three β-sheets, the helix and the two sets of long-range contacts, around contacts 14-60 and
23-84, respectively (see Fig. 5). The four contacts in each group with the highest intensity of formation above $T_F$ are
summarised in Table III. Out of the 24 contacts, 12 involve a key site, which is two standard deviations away
from the number of matches expected on a random basis (7.9 ± 2.1). Again, this testifies both to the reliability of the
general scheme followed here and to its robustness in the different possible implementations.
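The random-expectation formulas quoted above are those of sampling without replacement (a hypergeometric distribution) and can be evaluated directly. In the sketch below the function name is hypothetical, while the inputs $t = 24$, $N = 180$ and $m = 60$ are the values quoted in the text; with these inputs the formulas give roughly 8.0 ± 2.2, close to the 7.9 ± 2.1 reported above.

```python
import math

def random_expectation(t, N, m):
    """Mean and standard deviation of marked items in a random list of t out of N."""
    mean = t * m / N
    variance = t * m * (N - m) * (N - t) / (N ** 2 * (N - 1))
    return mean, math.sqrt(variance)

mean, std = random_expectation(t=24, N=180, m=60)
z_score = (12 - mean) / std  # 12 observed matches among the 24 contacts of Table III
```

The same function can be applied to each column of Table II by setting `t` to the number of top-ranking contacts retained.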

Interestingly, the results of Table III account better than those of Table II for the heterogeneous location of the
key folding sites. The emerging conclusion is that a complete description of the crucial contacts can be obtained
only by monitoring all the key stages of the folding process. In standard MC and MD simulations of protein
unfolding/refolding, it is the simulated dynamics that reveals which, and how many, delicate stages exist. In the present
approach, the folding process is characterised analytically, so the complete set of folding bottlenecks follows from
the study of distinct groups of interrelated contacts.

Finally, we remark that the determination of the key contacts does not uniquely provide the key folding sites, since 
two sites are involved in each pairwise contact. This ambiguity can, in several cases, be resolved either by selecting 
those sites that take part in several crucial contacts, or by examining their distribution on the three-dimensional 
native structure for clues that may help breaking the ambiguity. 



CONCLUSIONS 



We have used an analytical technique to study and characterize the folding process of globular proteins. This
deterministic method allows the automated identification of contacts involved in rate-limiting folding steps. The
whole folding process is particularly sensitive to mutations occurring at sites involved in such crucial
contacts. We tested our scheme and its usefulness in pinpointing the crucial sites by applying it to HIV-1 protease.
For this enzyme, extensive clinical trials have allowed the identification of several sites involved in drug-resistance
mutations. Such sites have a meaningful overlap with the key folding sites predicted by our scheme, which requires a modest
computational effort compared to more sophisticated stochastic simulation techniques. This indicates that the
available inhibiting drugs are quite effective, since they can be eluded only by mutations of the (sensitive) key sites of
the protease.

The proposed approach to identify the crucial residues is quite general and ought to be useful to identify the kinetic 
bottlenecks of other viral enzymes of pharmaceutical interest, thus aiding the development of novel effective inhibitors. 

We expect to focus our future efforts on improving the present approach by taking into account the propensities
of different amino acids to form contacting pairs. The present limitation, namely the neglect of such propensities, can be
overcome by introducing physically viable (attractive) pairwise interactions [36, 37]. In the present approach this
possibility was deliberately avoided
to highlight the influence of the native-state topology alone on the kinetic bottlenecks, irrespective of the different
chemical nature and strength of the effective amino acid interactions. We expect that the inclusion of such effects,
while not distorting the overall picture presented here, may change the relative strength of spatially close contacts.
This may improve the agreement between Table I and Tables II and III by resolving those cases where a site adjacent to a
mutating one is selected.

We are indebted to Paolo Carloni for several illuminating discussions and for having stimulated the present work. 
This work was supported by INFM, Murst Cofin2001. 



APPENDIX 



In this appendix we discuss how the translational invariance of a quadratic energy-scoring function can be explicitly
broken by fixing the centre of mass of the system at the origin. The constrained partition function is written as:

$$ Z = \int \prod_{i=1}^{N} d^3 x_i \; e^{-\frac{1}{2} \sum_{ij} \mathbf{x}_i A_{ij} \mathbf{x}_j} \; \delta^3\!\Big( \sum_i \mathbf{x}_i \Big) \tag{11} $$

where the matrix $A$ incorporates the quadratic dependence of $H_0$ in Eq. (4) on the space coordinates (and also
includes the $1/T$ factor to yield the usual Boltzmann weight). The translational invariance of $H_0$ implies that $A$
satisfies the property $\sum_j A_{ij} = 0$, which amounts to saying that the uniform vector, $v_1 = N^{-1/2} (1, 1, \ldots, 1)$, is an
eigenvector of $A$ with eigenvalue $\lambda_1 = 0$. We assume that $H_0$ is invariant only under the simultaneous translation of all
the coordinates, $\{\mathbf{x}_i\}$. In this case all other eigenvalues, $\{\lambda_{i>1}\}$, are strictly positive and the corresponding eigenvectors
$v_{i>1}$ are all orthogonal to the zero mode $v_1$.
By rewriting the Dirac-delta constraint as

$$ \delta^3(\mathbf{z}) = \lim_{c \to \infty} \left( \frac{c}{2\pi} \right)^{3/2} e^{-c \, \mathbf{z}^2 / 2} \tag{12} $$

the partition function takes on the form $Z = \lim_{c \to \infty} Z_c$, where

$$ Z_c = \left( \frac{c}{2\pi} \right)^{3/2} \int \prod_{i=1}^{N} d^3 x_i \; e^{-\frac{1}{2} \sum_{ij} \mathbf{x}_i A'_{ij} \mathbf{x}_j} \tag{13} $$

where $A'_{ij} = A_{ij} + c$. It is straightforward to see that $A'$ admits the same eigenvectors as $A$. Only the zero-mode
eigenvalue will change, from zero to $cN$, while all others will be unmodified. Upon performing the Gaussian integrations
in $Z_c$ we obtain:

$$ Z_c = N^{-3/2} \, (2\pi)^{\frac{3(N-1)}{2}} \, (\det{}' A)^{-3/2} . \tag{14} $$

This shows that $Z_c$ is effectively independent of $c$ and, therefore, the partition function $Z$ simplifies to

$$ Z = N^{-3/2} \, (2\pi)^{\frac{3(N-1)}{2}} \, (\det{}' A)^{-3/2} , \tag{15} $$

where the prime denotes that the determinant is calculated omitting the zero-mode eigenvalue.
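The key step of the argument, namely that adding a constant $c$ to every element of a translationally invariant matrix only shifts the zero-mode eigenvalue to $cN$, implies $\det(A + c) = cN \det{}' A$ for any $c > 0$. The sketch below checks this numerically for a simple chain Laplacian, chosen here only as a convenient example of a matrix with zero row sums; the function names are hypothetical.

```python
def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in a]
    n, det = len(a), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= f * a[col][k]
    return det

def chain_laplacian(n):
    """Matrix of a harmonic chain with unit springs: every row sums to zero."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    return lap

def det_prime(a, c):
    """Product of the non-zero eigenvalues, recovered as det(A + c) / (c N)."""
    n = len(a)
    shifted = [[a[i][j] + c for j in range(n)] for i in range(n)]
    return determinant(shifted) / (c * n)

lap = chain_laplacian(5)
values = [det_prime(lap, c) for c in (0.5, 1.0, 7.3)]
```

The entries of `values` coincide within rounding error, which is the numerical counterpart of the statement that $Z_c$ does not depend on $c$.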
Drug    Point mutations
RTN     20, 33, 35, 36, 46, 54, 63, 71, 82, 84, 90
NLF     30, 46, 63, 71, 77, 84
IND     10, 32, 46, 63, 71, 82, 84
SQV     10, 46, 48, 63, 71, 82, 84, 90
APR     46, 63, 82, 84

TABLE I: Mutations in the protease associated with resistance to FDA-approved drugs.



Crossover temperature    Cooperativity
25 - 86                  14 - 66
28 - 86                  14 - 64
58 - 76                  10 - 23
58 - 77                  14 - 65
57 - 77                  13 - 66
13 - 66                  12 - 66
30 - 86                  87 - 91
32 - 84                  13 - 65
32 - 76                  23 - 84
29 - 86                  10 - 22
31 - 84                  56 - 77
23 - 84                  57 - 77
14 - 66                  23 - 83
25 - 85                  22 - 84
14 - 65                  57 - 78
45 - 56                  86 - 89
89 - 91                  34 - 78
13 - 65                  58 - 77
87 - 89                  30 - 88
84 - 86                  32 - 75
56 - 58                  32 - 76
25 - 84                  31 - 76
86 - 88                  42 - 58
64 - 71                  90 - 94
57 - 76                  87 - 90

TABLE II: The top contacts ranked according to the crossover temperature (first column) and cooperativity of formation above
$T_F$ (second column).
Bottleneck    Key contacts
β1            10 - 23,  10 - 22,  14 - 20,  12 - 20
β2            42 - 58,  45 - 58,  43 - 58,  43 - 57
β3            56 - 77,  57 - 77,  58 - 77,  57 - 76
Other 1       14 - 66,  14 - 64,  14 - 65,  13 - 66
Other 2       23 - 84,  23 - 83,  22 - 84,  30 - 88
Helix         87 - 91,  86 - 89,  90 - 94,  87 - 90

TABLE III: The four contacts with the highest cooperativity of formation above $T_F$ for each of the six clusters of the contact
map.




FIG. 1: Structure of the HIV-1 PR dimer. The highlighted locations indicate residues where mutations causing drug resistance
are observed.

FIG. 2: Specific heat and overlap of a monomer of HIV-1 PR. The temperature is scaled with the temperature $T_F$ where
the specific heat peak occurs.

FIG. 3: Plot of $P_i$, the degree to which amino acid $i$ is in a native-like conformation, versus the site index $i$. In ascending
order the curves are calculated at $T/T_F$ = 1.5, 1.0 and 0.5. The bar at the bottom shows the secondary structure associated
with amino acid $i$.

FIG. 5: Color-coded contact map of the HIV-1 PR monomer. (a) Contacts with a large [small] crossover temperature are shown
in red [blue]. (b) Contacts with a large [small] cooperativity of formation above $T_F$ are shown in red [blue].



